A Framework for Diagnosing Changes in Evolving Data Streams
Abstract
In recent years, progress in hardware technology has made it possible for organizations to store and record large streams of transactional data. This results in databases that grow rapidly and without limit. Such data often show important changes in trends over time, and it is useful to understand, visualize and diagnose the evolution of these trends. When the data streams are fast and continuous, it becomes important to analyze and predict the trends quickly in an online fashion. In this paper, we discuss the concept of velocity density estimation, a technique used to understand, visualize and determine trends in the evolution of fast data streams. We show how to use velocity density estimation to create both temporal velocity profiles and spatial velocity profiles at periodic instants in time. These profiles are then used to predict three kinds of data evolution: dissolution, coagulation and shift. Methods are proposed to visualize the changing data trends in a single online scan of the data stream, with a computational requirement that is linear in the number of data points. In addition, batch processing techniques are proposed to identify combinations of dimensions that show the greatest amount of global evolution. The techniques discussed in this paper can be easily extended to spatio-temporal data, changes in data snapshots at fixed instances in time, or any other data that has a temporal component during its evolution.
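The abstract does not give the formula for velocity density, so the following is only a minimal sketch under a simplifying assumption: the velocity density at a point is taken to be the difference between kernel density estimates over two consecutive time windows, divided by the window length. The function names, the Gaussian kernel, the 1-D setting and the parameters `window` and `bandwidth` are all illustrative choices, not the paper's definitions. Under this reading, a negative value at a location suggests dissolution, a positive value suggests coagulation, and a negative region next to a positive one suggests a shift.

```python
import math


def gaussian_kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x from a list of 1-D points."""
    if not points:
        return 0.0
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def velocity_density(stream, x, t, window, bandwidth):
    """Approximate rate of change of the density at x, evaluated at time t.

    stream is a list of (timestamp, value) pairs. This is an assumed,
    simplified definition: density over (t - window, t] minus density over
    (t - 2 * window, t - window], divided by the window length.
    """
    recent = [v for ts, v in stream if t - window < ts <= t]
    earlier = [v for ts, v in stream if t - 2 * window < ts <= t - window]
    diff = gaussian_kde(recent, x, bandwidth) - gaussian_kde(earlier, x, bandwidth)
    return diff / window


# Toy stream: values sit at 0.0 for timestamps 0..9, then at 5.0 for 10..19.
stream = [(ts, 0.0) for ts in range(10)] + [(ts, 5.0) for ts in range(10, 20)]

v_old = velocity_density(stream, x=0.0, t=19, window=10, bandwidth=1.0)
v_new = velocity_density(stream, x=5.0, t=19, window=10, bandwidth=1.0)
print(v_old < 0, v_new > 0)  # density falls near 0.0 and rises near 5.0
```

This sketch rescans the stored stream for every query for the sake of clarity. A single-scan variant would instead maintain running kernel sums on a fixed grid of locations as each point arrives.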
Similar resources
An Intuitive Framework for Understanding Changes in Evolving Data Streams
Many organizations today store large streams of transactional data in real time. This data can often show important changes in trends over time. In many commercial applications, it may be valuable to provide the user with an understanding of the nature of changes occurring over time in the data stream. In this poster, we discuss the process of analysis of the significant changes and trends in da...
IBLStreams: a system for instance-based classification and regression on data streams
This paper presents an approach to learning on data streams called IBLStreams. More specifically, we introduce the main methodological concepts underlying this approach and discuss its implementation under the MOA software framework. IBLStreams is an instance-based algorithm that can be applied to classification and regression problems. In comparison to model-based methods for learning on data ...
Classification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies to restrict network access. Early methods in this field used obvious traffic features such as port number and protocol type to classify the traffic type. However, recent changes in applicat...
Priority Setting Meets Multiple Streams: A Match to Be Further Examined?; Comment on “Introducing New Priority Setting and Resource Allocation Processes in a Canadian Healthcare Organization: A Case Study Analysis Informed by Multiple Streams Theory”
With demand for health services continuing to grow as populations age and new technologies emerge to meet health needs, healthcare policy-makers are under constant pressure to set priorities, i.e., to make choices about the health services that can and cannot be funded within available resources. In a recent paper, Smith et al apply an influential policy studies framework – Kingdon’s multiple str...
Collaborative Filtering in Dynamic Streaming Environments
The increasing expansion of websites and their web usage necessitates increasingly scalable techniques for Web usage mining that can be better cast within the framework of mining evolving data streams [1, 5]. Despite recent developments in mining evolving Web clickstreams [3, 6], there has not been any investigation of the performance of collaborative filtering [2] in the demanding environment ...